-
-
Notifications
You must be signed in to change notification settings - Fork 419
PICARD-2729: Fix/refactor sanitize_date
; fix bug with shifting unknown month
#2722
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PICARD-2729: Fix/refactor sanitize_date
; fix bug with shifting unknown month
#2722
Conversation
… YYYY-00-00 => YYYY
dd13241
to
f93efb5
Compare
normalize_date
; fix bug with shifting monthsanitize_date
; fix bug with shifting unknown month
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks a lot of more code to handle just few more cases.
Old code was only bothering with "-" separator, new one handles "/" and no separator, Where are those new features used?
picard/util/datestr_util.py
Outdated
y = _clamp_year(int(m.group(1))) | ||
a = int(m.group(2)) | ||
b = int(m.group(3)) | ||
if y in (None, 0): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this test just after y
declaration since it doesn't depend on a
or b
.
Also it would perhaps be clearer to have:
y = _clamp_year(int(m.group(1))) or None
if y is None:
return ""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
picard/util/datestr_util.py
Outdated
def _parse_slash_separated(value: str) -> str | None: | ||
if "/" not in value: | ||
return None | ||
tokens = value.split("/") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We expect 2 slashes right? Case with more isn't really handled (1/2/3/4) and is catched by if len(tokens) != 3
below (None is returned)
I think we can simplify things a bit here:
def _parse_slash_separated(value: str) -> str | None:
try:
tokens = value.split("/", 2)
a = _int_or_none(tokens[0])
b = _int_or_none(tokens[1])
c = _int_or_none(tokens[2])
except IndexError:
return None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
picard/util/datestr_util.py
Outdated
c = _int_or_none(tokens[2]) | ||
if a is None or b is None or c is None or not (0 < c <= 9999): | ||
return None | ||
y = _clamp_year(c) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer:
y = _clamp_year(c) or None
if y is None:
return ""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Co-authored-by: Laurent Monin <lmonin.zas@gmail.com>
They are common formats I've seen in the wild and added for completeness. Does MusicBrainz sanitize them upstream? If so, can be removed. |
AFAIK, picard/picard/formats/apev2.py Line 165 in 57dac02
picard/picard/formats/vorbis.py Line 150 in 57dac02
Line 309 in 57dac02
Line 346 in 57dac02
References: In ISO_8601 it seems hyphens are optional, though: So supporting no hyphen notation is perhaps a good idea but apparently that's only for YYYYMMDD case. It is important to note that current code only support hyphen format (YYYY, YYYY-MM, YYYY-MM-DD) @phw probably knows more about dates & tags, insight welcome. Regarding this patch, I still think that's overcomplicated and it doesn't address the issue at all, as https://tickets.metabrainz.org/browse/PICARD-2729 is about "Allow disabling date sanitazation for APE and Vorbis tags" not about sanitizing more (or differently). We should create a specific ticket for "The cases 2005-00-12" => "2005-12" and "0000-00-12" => "0012" are a bug, though. This should be fixed." and only address this in this PR. |
Right, I planned on addressing that next PR, the new It would close out stale #2406 |
Sorry, only jumping in quickly due to lack of time.
This is the part I don't understand, because this specific issue had been already fixed in PICARD-2712, see PR #2283. Hence I don't understand why this PR claims to fix those. |
My mistake. I thought it hasn't been addressed yet; was looking at the un-closed ticket. Should we close without merging? |
Yes, closing, since the ticket in title isn't actually addressed here anyway (and the other issue was already fixed as said by @phw) |
Summary
Problem
Solution
picard.util.sanitize_date
that preserves unknown components ("00") without shifting, drops only truly unknown trailing parts, and supports: "YYYY", "YYYY-MM", "YYYY-MM-DD", unambiguous "YYYY-DD-MM", "DD/MM/YYYY" vs "MM/DD/YYYY" (heuristic), and compact "YYYYMMDD"/"YYYYDDMM"._parse_pure_year
,_parse_iso_like
,_parse_slash_separated
,_parse_compact_eight
,_format_from_components
.test/test_dateutil_normalize.py
using fixtures andpytest.mark.parametrize
.sanitize_date
docstring (NumPy style) to match behavior and examples.Action
Additional actions required:
Note: Will need a follow-up PR to add option UI's and to close #2406. refactored
sanitize_date
sets up ability to sanitize dates or not.